The supplementary material consists of the lptorch package, which quantizes activations, weights, errors, gradients, and master weights,
and example quantized models & training code for easy understanding of the package.

Supplementary material
    - lptorch package
        PyTorch CUDA extension for low-precision training
        More explanation can be found below.
    - qmodel
        - LSTM2.py
            quantized 2-layer LSTM model
        - mobilenet.py
            quantized MobileNetV2 model
        - resnet.py
            quantized ResNet model
    - mobilenetv2.py
        training code for MobileNetV2 on ImageNet
    - resnet18.py
        training code for ResNet-18 on ImageNet
    - word_language_model.py
        training code for 2-layer LSTM on PTB
    - utils.py
        Other code needed for training.
        It contains functions to save / load the model, a progress bar, and so on.

How to use lptorch package

Installation
    Requirements: PyTorch, nvcc
    Steps
        cd 'lptorch package'
        make run # or python setup.py install

Quantization method
    format class
        - lptorch.quant.linear_format(bit_num: int)
            Quantization format for block floating point such as Flexpoint [1].
        - lptorch.quant.custom_fp_format(man: list)
            Quantization format for an arbitrary format with a shared exponent bias.
            'man' is the significand list in equation (8).
        - lptorch.quant.fp_format(exp_bit: int, man_bit: int, bias: int(default: None))
            Quantization format such as FP16 with a fixed exponent bias; the shared exponent bias does not move.
            'bias' is the exponent bias of the floating-point format.
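The idea behind linear_format (block floating point) can be illustrated with a short plain-Python sketch: all values in a tensor share one exponent, chosen from the maximum magnitude, and each value keeps only a bit_num-bit signed integer mantissa. This is a conceptual sketch with made-up function names, not lptorch's actual implementation.

```python
import math

def block_fp_quantize(values, bit_num):
    """Quantize a list of floats to block floating point:
    one shared exponent, bit_num-bit signed integer mantissas."""
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return list(values)
    # Shared exponent chosen so the largest magnitude fits the mantissa range.
    shared_exp = math.floor(math.log2(max_abs))
    # Step size between representable values.
    scale = 2.0 ** (shared_exp - (bit_num - 2))
    max_int = 2 ** (bit_num - 1) - 1  # largest signed mantissa
    out = []
    for v in values:
        q = round(v / scale)
        q = max(-max_int - 1, min(max_int, q))  # clamp to the signed range
        out.append(q * scale)
    return out
```

For example, with bit_num = 4 the tensor [0.9, 0.1] gets shared exponent -1 and step size 0.125, so the values quantize to 0.875 and 0.125; small values lose relative precision because the exponent is shared.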
    quant class
        A class that stores the quantization format (a format class), whether to apply stochastic rounding, and so on.
        lptorch.quant.quant(qformat: format class, room: integer(default: 0), tracking: bool(default: True), stochastic: bool(default: False), ch_wise: bool(default: False))
            - room
                When tracking the shared exponent bias (custom_fp_format) or shared exponent (linear_format) using information from the previous batch,
                the target shared exponent bias is set to log2(maximum_abs_val) + room, so room is extra headroom to avoid overflow.
            - tracking
                If tracking is True, the shared exponent bias is determined from the previous batch's maximum value.
                If tracking is False, the shared exponent bias is recomputed exactly at every quantization from the maximum value of the current data.
            - stochastic
                If stochastic is True, quantization is performed stochastically.
                For example, a value of 0.3 is rounded up (to 1) with probability 0.3 and rounded down (to 0) with probability 0.7.
            - ch_wise
                If ch_wise is True, each output channel of a convolution weight gets its own shared exponent bias.
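The stochastic and room options can be illustrated in plain Python. The helper names below are hypothetical, chosen for illustration only: stochastic_round rounds up with probability equal to the fractional part (which makes rounding unbiased in expectation), and tracked_bias_target computes the log2(maximum_abs_val) + room target described above.

```python
import math
import random

def stochastic_round(x, rng=random.Random(0)):
    """Round x up with probability equal to its fractional part,
    e.g. 0.3 -> 1 with probability 0.3 and -> 0 with probability 0.7."""
    floor_x = math.floor(x)
    frac = x - floor_x
    return floor_x + (1 if rng.random() < frac else 0)

def tracked_bias_target(prev_batch_max_abs, room=0):
    """Target shared exponent bias from the previous batch's maximum
    absolute value, with 'room' extra bits of headroom against overflow."""
    return math.log2(prev_batch_max_abs) + room

# Averaging many stochastic roundings of 0.3 is unbiased (approaches 0.3).
_rng = random.Random(0)
samples = [stochastic_round(0.3, _rng) for _ in range(10000)]
mean = sum(samples) / len(samples)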
    Quantization setting
        In lptorch, there are five quantization targets: error, activation, weight, weight gradient, and master weight.
        If you want to quantize any of them, set the format for that target:
        - lptorch.set_error_quant(quant class)
        - lptorch.set_activ_quant(quant class)
        - lptorch.set_weight_quant(quant class)
        - lptorch.set_grad_quant(quant class)
        - lptorch.set_master_quant(quant class)
        Also, if you want to use hysteresis quantization for weights, use the code below.
        - lptorch.set_hysteresis_update(True) # default value is False
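Hysteresis quantization rounds a new value toward its previous quantized value rather than to the nearest grid point, which suppresses oscillation between adjacent quantization levels across training steps. The following is a minimal plain-Python sketch of this idea, not lptorch's implementation:

```python
import math

def hysteresis_quantize(x, prev_q, step):
    """Quantize x to a uniform grid with spacing 'step', rounding toward
    the previous quantized value prev_q to avoid level flip-flopping."""
    if x >= prev_q:
        # Value moved up: round down, i.e. toward prev_q.
        return math.floor(x / step) * step
    else:
        # Value moved down: round up, i.e. toward prev_q.
        return math.ceil(x / step) * step

# A weight oscillating slightly around 0.5 stays at its previous level:
q = 0.5
q = hysteresis_quantize(0.52, q, 0.1)   # stays at 0.5
q = hysteresis_quantize(0.48, q, 0.1)   # stays at 0.5
```

With nearest rounding, 0.52 and 0.48 would both snap to 0.5 here, but values like 0.56 / 0.44 would alternate between 0.6 and 0.4 every step; the hysteresis rule only moves the quantized weight once the underlying value has crossed a full grid step.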

Build quantized network
    lptorch provides building blocks for constructing a quantized network.
    - lptorch.nn.QLayer(module: torch.nn.module(default: None), function: lptorch.nn.F(default: None), 
                        dual: boolean list(default: [False, False]), fixed_scale: int list(default: [None, None]), 
                        last: boolean(default: False), tracking: boolean list (default: [True, True]),
                        quantize: boolean list(default: [True, True]))
        In QLayer, the forward pass proceeds in the following order:
        input -> forward function of module -> function -> quantize with activ_quant -> quantized output
        The backward pass proceeds as:
        gradient output -> quantize with error_quant -> backward function of function -> backward function of module -> gradient input
        With QLayer, PyTorch modules such as torch.nn.Conv2d can be used with minimal quantization overhead.
        If you need to quantize inside a PyTorch module, do not use QLayer.
        - dual
            If a specific layer needs higher precision, set dual to True; the effective precision is then doubled.
            This is typically used for the first layer's input (for example, the RGB values of an image).
            The first element of dual applies to the forward path (activation), and the second to the backward path (error).
        - fixed_scale
            If you want a fixed shared exponent bias or exponent bias, set it with fixed_scale.
        - last
            Indicates the last layer of the quantized module.
        - tracking
            You can choose whether to track the shared exponent bias for a specific layer.
        - quantize
            You can turn off quantization for a specific layer.
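The forward ordering above (module forward, then function, then activation quantization) can be sketched as a small wrapper class. This is a conceptual illustration in plain Python with toy stand-ins and hypothetical names; it omits autograd and is not lptorch's QLayer:

```python
class SketchQLayer:
    """Conceptual sketch of QLayer's forward order:
    input -> module forward -> function -> activation quantization."""

    def __init__(self, module=None, function=None, activ_quant=None):
        self.module = module          # e.g. a conv/linear layer
        self.function = function      # e.g. an activation function
        self.activ_quant = activ_quant  # activation quantizer

    def forward(self, x):
        if self.module is not None:
            x = self.module(x)
        if self.function is not None:
            x = self.function(x)
        if self.activ_quant is not None:
            x = self.activ_quant(x)
        return x

# Toy stand-ins: a "layer" that doubles, ReLU, and a 0.25-step quantizer.
double = lambda x: 2 * x
relu = lambda x: max(0.0, x)
round_quant = lambda x: round(x * 4) / 4
layer = SketchQLayer(module=double, function=relu, activ_quant=round_quant)
y = layer.forward(0.6)  # double -> 1.2, relu -> 1.2, quantize -> 1.25
```

The wrapper shows why QLayer has low overhead: quantization is applied once at the layer boundary, and any unmodified PyTorch module can sit in the middle.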
    - lptorch.nn.NQLayer(module: torch.nn.module(default: None), function: lptorch.nn.F(default: None), last: boolean(default: False))
        In NQLayer, the forward pass proceeds in the following order:
        input -> forward function of module -> function -> not-quantized output
        The backward pass proceeds as:
        gradient output -> backward function of function -> backward function of module -> gradient input
    - lptorch.nn.QAdd(last: boolean(default: False))
        QAdd performs quantized addition.
    - lptorch.nn.QClone(last: boolean(default: False))
        QClone clones its input in the forward pass and performs quantized addition of the gradients in the backward pass.
    - lptorch.nn.LSTM
        Quantized version of LSTM.
        For LSTM, QLayer cannot be used because the inside of the module must be quantized.
    - lptorch.nn.GRU
        Quantized version of GRU.
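QClone's behavior mirrors how autograd treats a tensor consumed by two branches: the forward pass produces identical copies, and the backward pass sums the two branch gradients, which in lptorch is where the quantized addition happens. A plain-Python sketch of that gradient flow (hypothetical names, no autograd):

```python
def qclone_forward(x):
    """Forward pass: produce two identical copies of the input."""
    return x, x

def qclone_backward(grad_a, grad_b, quantize=lambda g: g):
    """Backward pass: the two branch gradients are (quantized-)added
    into a single gradient for the original input."""
    return quantize(grad_a + grad_b)

ya, yb = qclone_forward(3.0)            # both branches see 3.0
grad_in = qclone_backward(0.25, 0.5)    # combined gradient 0.75
```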

Quantized optimizer
    In lptorch, the optimizer performs weight, weight gradient, and master weight quantization.
    So, if you want to quantize these values, you must use an lptorch optimizer.
    Available optimizers: SGD, RMSprop, fair_Adam, Adam, AdamW
    
    lptorch.optim.optimizer_type(same parameters as PyTorch..., weight_quantize: boolean(default: True), quant: quant class(default: None))
        - weight_quantize
            If weight_quantize is False, the optimizer does not quantize the weight.
            This is used for batchnorm parameters.
        - quant
            You can set a weight quantization format different from the one set by lptorch.set_weight_quant.
            If quant is None (the default), the weight is quantized with the format set by lptorch.set_weight_quant.
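The division of labor described above can be sketched as a single plain-Python SGD step: the gradient is quantized, the high-precision master weight is updated and re-quantized to the master format, and a low-precision copy of the weight is produced for the next forward/backward pass. Function and parameter names are illustrative, not lptorch's API:

```python
def make_uniform_quant(step):
    """Return a quantizer that rounds to a uniform grid of spacing 'step'."""
    return lambda x: round(x / step) * step

def quantized_sgd_step(weight, master_weight, grad, lr,
                       grad_quant, master_quant, weight_quant):
    """One SGD step with weight-gradient, master-weight, and weight quantization."""
    g = grad_quant(grad)                            # quantize weight gradient
    master = master_quant(master_weight - lr * g)   # update + quantize master weight
    weight = weight_quant(master)                   # low-precision copy for fwd/bwd
    return weight, master

w, m = quantized_sgd_step(
    weight=0.5, master_weight=0.5, grad=0.33, lr=0.1,
    grad_quant=make_uniform_quant(0.25),     # coarse gradient grid
    master_quant=make_uniform_quant(0.001),  # fine master-weight grid
    weight_quant=make_uniform_quant(0.125),  # coarse weight grid
)
```

Here the gradient 0.33 quantizes to 0.25, the master weight moves to 0.475 on its fine grid, and the forward-pass weight snaps back to 0.5 on its coarse grid; the fine master copy is what lets many small updates accumulate.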

[1] Köster, Urs, et al. "Flexpoint: An adaptive numerical format for efficient training of deep neural networks." arXiv preprint arXiv:1711.02213 (2017).